HIV Knowledge Among Youth (15-24)

Author

Gersende Joder

How much do young people around the world really know about HIV?
Are there notable differences between genders? Between regions of the world?
Do socio-economic factors influence HIV knowledge?

Despite decades of scientific progress, the HIV epidemic remains one of the world's most critical health concerns. According to the World Health Organization, close to 40 million people were living with HIV at the end of 2023, and an estimated 630,000 people died of HIV-related causes that year. Effective prevention relies on robust education programmes and easy access to testing. However, millions of young people have little or incorrect information about HIV transmission and protection. This dashboard draws all its data from UNICEF and World Bank databases.

Code
from google.colab import drive
import geopandas as gpd

drive.mount('/content/drive')
shapefile_path = '/content/drive/MyDrive/data/map/ne_110m_admin_0_countries.shp'

world = gpd.read_file(shapefile_path)
Mounted at /content/drive
Code
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

df_knowledge = pd.read_csv('/content/drive/MyDrive/data/hiv_knowledge.csv', sep=';')

df_knowledge['year'] = pd.to_numeric(df_knowledge['year'], errors='coerce')
df_knowledge = df_knowledge.dropna(subset=['year'])
df_knowledge = df_knowledge.sort_values(['country', 'year'], ascending=[True, False])

if 'sex' in df_knowledge.columns:
    df_last = (
        df_knowledge.groupby(['country', 'year'])['obs_value']
        .mean()
        .reset_index()
        .sort_values(['country', 'year'], ascending=[True, False])
        .groupby('country')
        .first()
        .reset_index()
    )
else:
    df_last = df_knowledge.groupby('country').first().reset_index()

shapefile_path = '/content/drive/MyDrive/data/map/ne_110m_admin_0_countries.shp'
world = gpd.read_file(shapefile_path)

merged = world.merge(df_last, left_on='NAME', right_on='country', how='left')

purple_cmap = LinearSegmentedColormap.from_list("custom_purple", ["#5E17EB", "#D1BDFA"])

fig, ax = plt.subplots(1, 1, figsize=(20, 10))
merged.plot(
    column='obs_value',
    cmap=purple_cmap,
    linewidth=0.5,
    edgecolor='0.8',
    missing_kwds={"color": "lightgrey", "label": "No data"},
    legend=True,
    ax=ax
)
ax.set_title('Accurate HIV Knowledge Among Youth (15–24): Regional Discrepancies (2000-2023)',
             fontsize=20, weight='bold')
ax.text(
    0.5, 1.0,
    "% of youth (15-24) with comprehensive, correct HIV knowledge, by country (latest available year)",
    fontsize=16, color='dimgray', ha='center', va='top',
    transform=ax.transAxes
)
ax.axis('off')
plt.show()

Code
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/data/hiv_knowledge.csv', sep=';')

region_map = {
    # South Asia
    "Afghanistan": "South Asia",
    "Bangladesh": "South Asia",
    "India": "South Asia",
    "Maldives": "South Asia",
    "Nepal": "South Asia",
    "Pakistan": "South Asia",

    # Eastern Europe & Central Asia
    "Armenia": "Eastern Europe & Central Asia",
    "Azerbaijan": "Eastern Europe & Central Asia",
    "Ukraine": "Eastern Europe & Central Asia",
    "Tajikistan": "Eastern Europe & Central Asia",

    # Europe
    "Albania": "Europe",

    # Middle East & North Africa (MENA)
    "Egypt": "Middle East & North Africa (MENA)",
    "Jordan": "Middle East & North Africa (MENA)",
    "Lebanon": "Middle East & North Africa (MENA)",
    "Morocco": "Middle East & North Africa (MENA)",

    # Sub-Saharan Africa
    "Benin": "Sub-Saharan Africa",
    "Burkina Faso": "Sub-Saharan Africa",
    "Burundi": "Sub-Saharan Africa",
    "Cameroon": "Sub-Saharan Africa",
    "Comoros": "Sub-Saharan Africa",
    "Chad": "Sub-Saharan Africa",
    "Congo": "Sub-Saharan Africa",
    "Congo, the Democratic Republic of": "Sub-Saharan Africa",
    "Ethiopia": "Sub-Saharan Africa",
    "Gabon": "Sub-Saharan Africa",
    "Gambia": "Sub-Saharan Africa",
    "Ghana": "Sub-Saharan Africa",
    "Guinea": "Sub-Saharan Africa",
    "Ivory Coast": "Sub-Saharan Africa",
    "Kenya": "Sub-Saharan Africa",
    "Lesotho": "Sub-Saharan Africa",
    "Madagascar": "Sub-Saharan Africa",
    "Malawi": "Sub-Saharan Africa",
    "Mali": "Sub-Saharan Africa",
    "Mauritania": "Sub-Saharan Africa",
    "Mozambique": "Sub-Saharan Africa",
    "Namibia": "Sub-Saharan Africa",
    "Nigeria": "Sub-Saharan Africa",
    "Rwanda": "Sub-Saharan Africa",
    "Sao Tome and Principe": "Sub-Saharan Africa",
    "Senegal": "Sub-Saharan Africa",
    "Sierra Leone": "Sub-Saharan Africa",
    "Somalia": "Sub-Saharan Africa",
    "Swaziland": "Sub-Saharan Africa",
    "Tanzania, United Republic of": "Sub-Saharan Africa",
    "Togo": "Sub-Saharan Africa",
    "Uganda": "Sub-Saharan Africa",
    "Zambia": "Sub-Saharan Africa",
    "Zimbabwe": "Sub-Saharan Africa",

    # Latin America & Caribbean
    "Bolivia, Plurinational State of": "Latin America & Caribbean",
    "Colombia": "Latin America & Caribbean",
    "Dominican Republic": "Latin America & Caribbean",
    "Guyana": "Latin America & Caribbean",
    "Haiti": "Latin America & Caribbean",
    "Honduras": "Latin America & Caribbean",
    "Nicaragua": "Latin America & Caribbean",
    "Peru": "Latin America & Caribbean",

    # East Asia & Pacific
    "Cambodia": "East Asia & Pacific",
    "Indonesia": "East Asia & Pacific",
    "Myanmar": "East Asia & Pacific",
    "Philippines": "East Asia & Pacific",
    "Timor-Leste": "East Asia & Pacific",
    "Papua New Guinea": "East Asia & Pacific",
}

df_sorted = df.sort_values(['country', 'year'], ascending=[True, False])
df_last_year = df_sorted.groupby(['country', 'sex']).first().reset_index()

df_last = (
    df_last_year.groupby('country')['obs_value'].mean().reset_index()
)

df_bar = df_last.nsmallest(15, 'obs_value').copy()

df_bar['region'] = df_bar['country'].map(region_map)

region_colors = {
    "South Asia": "#59AD65",
    "Eastern Europe & Central Asia": "#4D9DAB",
    "Europe": "#F494C2",
    "Middle East & North Africa (MENA)": "#FDCC9E",
    "Sub-Saharan Africa": "#FF9D4B",
    "Latin America & Caribbean": "#6EB0A3",
    "East Asia & Pacific": "#98DF8C"
}

df_bar = df_bar.sort_values('obs_value', ascending=False)
df_bar['country'] = pd.Categorical(df_bar['country'], categories=df_bar['country'], ordered=True)

from plotnine import *
chart = (
    ggplot(df_bar, aes(x='country', y='obs_value', fill='region'))
    + geom_col(width=0.7)
    + geom_text(
        aes(label=df_bar['obs_value'].map(lambda x: f"{x:.1f}")),
        ha='left',
        nudge_y=0.2,
        size=9,
        color="black"
    )
    + scale_fill_manual(values=region_colors)
    + coord_flip()
    + labs(
        title="Falling Behind: Where Youth Are Least Informed",
        subtitle="% of youth (15-24) with comprehensive, correct HIV knowledge, by country (latest available year)",
        x="",
        y="% of Youth with Comprehensive, Correct HIV Knowledge",
        fill="Region"
    )
    + scale_y_continuous(expand=(0,0), limits=(0, df_bar['obs_value'].max() + 2))
    + theme_minimal()
    + theme(
        plot_title=element_text(weight='bold', size=14, ha='left'),
        plot_subtitle=element_text(size=8.5, margin={'b': 12}),
        axis_text=element_text(size=10),
        axis_title_y=element_text(size=10, margin={'r': 10}),
        figure_size=(7,6),
        panel_grid_major_x=element_line(color="#eeeeee"),
        panel_grid_major_y=element_blank(),
        panel_grid_minor=element_blank(),
        panel_background=element_rect(fill='white'),
        legend_position='none'
    )
)
chart

📍Big gaps, low knowledge

HIV knowledge varies widely across the globe. Across the countries examined, an average of 30% of young people had comprehensive, correct knowledge of HIV between 2013 and 2023.

Key insight: Sub-Saharan Africa displays a wide range of results, with knowledge rates above 50% in some countries and critically low rates in others.

The bottom 15 performers show that in many countries, fewer than 1 in 10 young people manage to answer HIV-related questions correctly. Misinformation and silence know no boundaries. Neither should education.
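
The "fewer than 1 in 10" reading can be checked with a short calculation. The sketch below is only an illustration: it assumes the same column names as the dashboard data (`country`, `sex`, `year`, `obs_value`), and the toy rows and the helper name `latest_country_average` are hypothetical, not real survey results.

```python
import pandas as pd

# Hypothetical toy rows: placeholder countries and values, not real survey data.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B", "C", "C"],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "year": [2010, 2010, 2018, 2018, 2015, 2015, 2020, 2020],
    "obs_value": [20.0, 30.0, 40.0, 50.0, 5.0, 9.0, 8.0, 16.0],
})


def latest_country_average(df):
    """Average obs_value across sexes for each country's most recent year."""
    latest_year = df.groupby("country")["year"].transform("max")
    latest = df[df["year"] == latest_year]
    return latest.groupby("country")["obs_value"].mean()


latest_avg = latest_country_average(toy)

# Countries where fewer than 1 in 10 young people have comprehensive knowledge.
below_10 = latest_avg[latest_avg < 10]
print(latest_avg)
print(f"{len(below_10)} of {len(latest_avg)} toy countries fall below 10%")
```

Selecting rows by each country's maximum year, rather than taking the first row per group, keeps both sexes from the same survey round in the average.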

Code
from IPython.display import display, HTML

html_code = """
<a href="" target="_blank">
    <img src="/content/drive/MyDrive/images/image.png" width="1000"/>
</a>
"""
display(HTML(html_code))
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('/content/drive/MyDrive/data/hiv_knowledge.csv', sep=';')

region_map = {
    # South Asia
    "Afghanistan": "South Asia",
    "Bangladesh": "South Asia",
    "India": "South Asia",
    "Maldives": "South Asia",
    "Nepal": "South Asia",
    "Pakistan": "South Asia",

    # Eastern Europe & Central Asia
    "Armenia": "Eastern Europe & Central Asia",
    "Azerbaijan": "Eastern Europe & Central Asia",
    "Ukraine": "Eastern Europe & Central Asia",
    "Tajikistan": "Eastern Europe & Central Asia",

    # Europe
    "Albania": "Europe",

    # Middle East & North Africa (MENA)
    "Egypt": "Middle East & North Africa (MENA)",
    "Jordan": "Middle East & North Africa (MENA)",
    "Lebanon": "Middle East & North Africa (MENA)",
    "Morocco": "Middle East & North Africa (MENA)",

    # Sub-Saharan Africa
    "Benin": "Sub-Saharan Africa",
    "Burkina Faso": "Sub-Saharan Africa",
    "Burundi": "Sub-Saharan Africa",
    "Cameroon": "Sub-Saharan Africa",
    "Comoros": "Sub-Saharan Africa",
    "Chad": "Sub-Saharan Africa",
    "Congo": "Sub-Saharan Africa",
    "Congo, the Democratic Republic of": "Sub-Saharan Africa",
    "Ethiopia": "Sub-Saharan Africa",
    "Gabon": "Sub-Saharan Africa",
    "Gambia": "Sub-Saharan Africa",
    "Ghana": "Sub-Saharan Africa",
    "Guinea": "Sub-Saharan Africa",
    "Ivory Coast": "Sub-Saharan Africa",
    "Kenya": "Sub-Saharan Africa",
    "Lesotho": "Sub-Saharan Africa",
    "Madagascar": "Sub-Saharan Africa",
    "Malawi": "Sub-Saharan Africa",
    "Mali": "Sub-Saharan Africa",
    "Mauritania": "Sub-Saharan Africa",
    "Mozambique": "Sub-Saharan Africa",
    "Namibia": "Sub-Saharan Africa",
    "Nigeria": "Sub-Saharan Africa",
    "Rwanda": "Sub-Saharan Africa",
    "Sao Tome and Principe": "Sub-Saharan Africa",
    "Senegal": "Sub-Saharan Africa",
    "Sierra Leone": "Sub-Saharan Africa",
    "Somalia": "Sub-Saharan Africa",
    "Swaziland": "Sub-Saharan Africa",
    "Tanzania, United Republic of": "Sub-Saharan Africa",
    "Togo": "Sub-Saharan Africa",
    "Uganda": "Sub-Saharan Africa",
    "Zambia": "Sub-Saharan Africa",
    "Zimbabwe": "Sub-Saharan Africa",

    # Latin America & Caribbean
    "Bolivia, Plurinational State of": "Latin America & Caribbean",
    "Colombia": "Latin America & Caribbean",
    "Dominican Republic": "Latin America & Caribbean",
    "Guyana": "Latin America & Caribbean",
    "Haiti": "Latin America & Caribbean",
    "Honduras": "Latin America & Caribbean",
    "Nicaragua": "Latin America & Caribbean",
    "Peru": "Latin America & Caribbean",

    # East Asia & Pacific
    "Cambodia": "East Asia & Pacific",
    "Indonesia": "East Asia & Pacific",
    "Myanmar": "East Asia & Pacific",
    "Philippines": "East Asia & Pacific",
    "Timor-Leste": "East Asia & Pacific",
    "Papua New Guinea": "East Asia & Pacific",
}

df['region'] = df['country'].map(region_map)
df = df[df['region'].notna()].copy()

df_avg = (
    df.groupby(['country', 'region', 'year'])['obs_value']
    .mean().reset_index()
)

df_region_year = (
    df_avg.groupby(['region', 'year'])['obs_value']
    .mean().reset_index()
    .rename(columns={'obs_value': 'hiv_knowledge_mean'})
)

df_region_year['hiv_knowledge_ma'] = (
    df_region_year
    .sort_values(['region', 'year'])
    .groupby('region')['hiv_knowledge_mean']
    .transform(lambda x: x.rolling(window=5, min_periods=1, center=True).mean())
)

region_colors = {
    "South Asia": "#59AD65",
    "Eastern Europe & Central Asia": "#4D9DAB",
    "Middle East & North Africa (MENA)": "#FDCC9E",
    "Sub-Saharan Africa": "#FF9D4B",
    "Latin America & Caribbean": "#6EB0A3",
    "East Asia & Pacific": "#98DF8C"
}

plt.figure(figsize=(12, 6))
for region, color in region_colors.items():
    data = df_region_year[df_region_year['region'] == region]
    plt.plot(data['year'], data['hiv_knowledge_mean'], color=color, alpha=0.3, linestyle='--')
    plt.plot(data['year'], data['hiv_knowledge_ma'], label=region, color=color, linewidth=2.5, marker='o')

plt.title('Evolution of HIV Knowledge Among Youth (15–24) by Region (5-year Moving Average)', fontsize=16, weight='bold')
plt.xlabel('Year', fontsize=14)
plt.ylabel('% of Youth with Comprehensive, Correct HIV Knowledge', fontsize=12)
plt.legend(title="Region", fontsize=11)
plt.ylim(0, 60)
plt.grid(axis='y', alpha=0.2)
plt.tight_layout()
plt.show()

📈 Progress but persistent gaps

Over the past two decades, comprehensive HIV knowledge among youth has increased in most regions. However, progress remains uneven: while South Asia and Latin America show notable improvements, Sub-Saharan Africa and the Middle East & North Africa still lag behind.

🌍 Key takeaway: Despite global advances, major disparities persist, highlighting the ongoing need for targeted education and outreach efforts.
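
One way to put a number on "progress" is the change in percentage points between a region's earliest and latest observed year. The sketch below assumes a table shaped like `df_region_year` (columns `region`, `year`, `hiv_knowledge_mean`); the toy values and the helper name `change_first_to_last` are hypothetical.

```python
import pandas as pd

# Hypothetical toy regional means: placeholder values, not real results.
toy = pd.DataFrame({
    "region": ["R1", "R1", "R1", "R2", "R2", "R2"],
    "year": [2005, 2012, 2020, 2006, 2013, 2021],
    "hiv_knowledge_mean": [20.0, 26.0, 35.0, 18.0, 19.0, 17.0],
})


def change_first_to_last(df):
    """Percentage-point change between each region's earliest and latest year."""
    ordered = df.sort_values(["region", "year"])
    grouped = ordered.groupby("region")["hiv_knowledge_mean"]
    return (grouped.last() - grouped.first()).rename("change_pp")


change = change_first_to_last(toy)
print(change)
```

A positive value indicates improvement over the observed period, while a value near zero or below indicates stagnation or decline. Because regions are surveyed in different years, the periods compared are not identical across regions.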

Code
from plotnine import *

df_sorted = df.sort_values(['country', 'sex', 'year'], ascending=[True, True, False])
df_last = df_sorted.groupby(['country', 'sex']).first().reset_index()

df_grouped = df_last.groupby(['region', 'sex'], as_index=False)['obs_value'].mean()

region_order = [
    'East Asia & Pacific',
    'Eastern Europe & Central Asia',
    'Europe',
    'Latin America & Caribbean',
    'Middle East & North Africa (MENA)',
    'South Asia',
    'Sub-Saharan Africa',
    'Other'
]
df_grouped['region'] = pd.Categorical(df_grouped['region'], categories=region_order, ordered=True)

sex_palette = {'Female': '#D3B4F6', 'Male': '#5CD8D2'}

chart = (
    ggplot(df_grouped, aes(x='region', y='obs_value', fill='sex'))
    + geom_col(position='dodge', width=0.7)
    + scale_fill_manual(values=sex_palette)
    + labs(
        title="HIV Knowledge Among Youth (15–24) by Gender and Region",
        x="",
        y="% of Youth with Comprehensive, Correct HIV Knowledge"
    )
    + scale_y_continuous(labels=lambda l: [f"{v:.0f}%" for v in l], expand=(0,0), limits=(0, 40))
    + theme_minimal()
    + theme(
        plot_title=element_text(weight='bold', size=14, ha='left'),
        axis_text_x=element_text(size=11, rotation=20, ha='right'),
        axis_text_y=element_text(size=11),
        legend_position='top',
        figure_size=(8,6),
        panel_grid_major_x=element_blank(),
        panel_grid_minor=element_blank(),
        panel_background=element_rect(fill='white')
    )
)
chart

🚻 Gender Gaps Persist

All regions show a recurring and concerning pattern: young women report lower HIV knowledge levels than young men.

The knowledge gap about HIV is wider in regions with broader gender inequality. While educational campaigns may reach schools, they may not reach all members of society equally.
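
The size of the gap in each region can be computed directly from a table shaped like `df_grouped` (columns `region`, `sex`, `obs_value`). This is a minimal sketch with hypothetical toy values; the helper name `gender_gap` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy regional averages by sex: placeholder values only.
toy = pd.DataFrame({
    "region": ["R1", "R1", "R2", "R2"],
    "sex": ["Female", "Male", "Female", "Male"],
    "obs_value": [22.0, 30.0, 25.0, 27.0],
})


def gender_gap(df):
    """Male minus female knowledge rate per region, in percentage points."""
    wide = df.pivot(index="region", columns="sex", values="obs_value")
    wide["gap_pp"] = wide["Male"] - wide["Female"]
    return wide.sort_values("gap_pp", ascending=False)


gaps = gender_gap(toy)
print(gaps)
```

A positive `gap_pp` means young men score higher than young women in that region; sorting by the gap puts the regions with the widest disparity first.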

Code
import pandas as pd
from plotnine import *

df = pd.read_csv('/content/drive/MyDrive/data/hiv_knowledge.csv', sep=';')
meta = pd.read_csv('/content/drive/MyDrive/data/metadata.csv', sep=';')

region_map = {
    # South Asia
    "Afghanistan": "South Asia",
    "Bangladesh": "South Asia",
    "India": "South Asia",
    "Maldives": "South Asia",
    "Nepal": "South Asia",
    "Pakistan": "South Asia",

    # Eastern Europe & Central Asia
    "Armenia": "Eastern Europe & Central Asia",
    "Azerbaijan": "Eastern Europe & Central Asia",
    "Ukraine": "Eastern Europe & Central Asia",
    "Tajikistan": "Eastern Europe & Central Asia",

    # Europe
    "Albania": "Europe",

    # Middle East & North Africa (MENA)
    "Egypt": "Middle East & North Africa (MENA)",
    "Jordan": "Middle East & North Africa (MENA)",
    "Lebanon": "Middle East & North Africa (MENA)",
    "Morocco": "Middle East & North Africa (MENA)",

    # Sub-Saharan Africa
    "Benin": "Sub-Saharan Africa",
    "Burkina Faso": "Sub-Saharan Africa",
    "Burundi": "Sub-Saharan Africa",
    "Cameroon": "Sub-Saharan Africa",
    "Comoros": "Sub-Saharan Africa",
    "Chad": "Sub-Saharan Africa",
    "Congo": "Sub-Saharan Africa",
    "Congo, the Democratic Republic of": "Sub-Saharan Africa",
    "Ethiopia": "Sub-Saharan Africa",
    "Gabon": "Sub-Saharan Africa",
    "Gambia": "Sub-Saharan Africa",
    "Ghana": "Sub-Saharan Africa",
    "Guinea": "Sub-Saharan Africa",
    "Ivory Coast": "Sub-Saharan Africa",
    "Kenya": "Sub-Saharan Africa",
    "Lesotho": "Sub-Saharan Africa",
    "Madagascar": "Sub-Saharan Africa",
    "Malawi": "Sub-Saharan Africa",
    "Mali": "Sub-Saharan Africa",
    "Mauritania": "Sub-Saharan Africa",
    "Mozambique": "Sub-Saharan Africa",
    "Namibia": "Sub-Saharan Africa",
    "Nigeria": "Sub-Saharan Africa",
    "Rwanda": "Sub-Saharan Africa",
    "Sao Tome and Principe": "Sub-Saharan Africa",
    "Senegal": "Sub-Saharan Africa",
    "Sierra Leone": "Sub-Saharan Africa",
    "Somalia": "Sub-Saharan Africa",
    "Swaziland": "Sub-Saharan Africa",
    "Tanzania, United Republic of": "Sub-Saharan Africa",
    "Togo": "Sub-Saharan Africa",
    "Uganda": "Sub-Saharan Africa",
    "Zambia": "Sub-Saharan Africa",
    "Zimbabwe": "Sub-Saharan Africa",

    # Latin America & Caribbean
    "Bolivia, Plurinational State of": "Latin America & Caribbean",
    "Colombia": "Latin America & Caribbean",
    "Dominican Republic": "Latin America & Caribbean",
    "Guyana": "Latin America & Caribbean",
    "Haiti": "Latin America & Caribbean",
    "Honduras": "Latin America & Caribbean",
    "Nicaragua": "Latin America & Caribbean",
    "Peru": "Latin America & Caribbean",

    # East Asia & Pacific
    "Cambodia": "East Asia & Pacific",
    "Indonesia": "East Asia & Pacific",
    "Myanmar": "East Asia & Pacific",
    "Philippines": "East Asia & Pacific",
    "Timor-Leste": "East Asia & Pacific",
    "Papua New Guinea": "East Asia & Pacific",
}

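# Assumption: metadata.csv provides 'country', 'year' and 'gdp_per_capita' columns;
# adjust the merge keys if the file is laid out differently.
df['region'] = df['country'].map(region_map)
df_avg = (
    df.groupby(['country', 'region', 'year'], as_index=False)['obs_value'].mean()
    .merge(meta[['country', 'year', 'gdp_per_capita']], on=['country', 'year'], how='left')
)
df_avg = df_avg.dropna(subset=['gdp_per_capita', 'region'])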

df_avg = df_avg[df_avg['region'] != "Europe"]

region_colors = {
    "South Asia": "#59AD65",
    "Eastern Europe & Central Asia": "#4D9DAB",
    "Middle East & North Africa (MENA)": "#FDCC9E",
    "Sub-Saharan Africa": "#FF9D4B",
    "Latin America & Caribbean": "#6EB0A3",
    "East Asia & Pacific": "#98DF8C"
}

chart = (
    ggplot(df_avg, aes(x='gdp_per_capita', y='obs_value', color='region'))
    + geom_point(size=3, alpha=0.85)
    + geom_smooth(aes(group='region'), method='lm', se=False, size=1.2)
    + scale_color_manual(values=region_colors)
    + scale_x_continuous(labels=lambda l: [f"${int(x/1000)}K" for x in l])
    + scale_y_continuous(labels=lambda l: [f"{int(x)}%" for x in l], limits=(0, 60))
    + labs(
        x="GDP per capita (constant 2015 US$)",
        y="% of Youth with Comprehensive, Correct HIV Knowledge",
        title="HIV Knowledge and GDP Per Capita: A Weak Link",
        subtitle="% of youth (15-24) with comprehensive, correct HIV knowledge, by region (latest available year)",
    )
    + theme_minimal()
    + theme(
        plot_title=element_text(weight='bold', size=18),
        axis_text_x=element_text(size=12),
        axis_text_y=element_text(size=12),
        legend_title=element_blank(),
        legend_position='top',
        figure_size=(8,5),
        panel_grid_minor=element_blank()
    )
)
chart

💸 Income doesn’t guarantee awareness

Is HIV knowledge linked to national wealth? Not necessarily.

A common assumption is that richer countries are better informed. In reality, some low-income countries, such as Burundi, outperform richer nations in HIV awareness. Meanwhile, countries in East Asia show a flat or slightly declining trend despite higher incomes.

These observations suggest that targeted educational campaigns, combined with effective national public health efforts, matter more than economic wealth alone.
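
The strength of the link can be summarised with a rank correlation, which is less sensitive than a straight-line fit to a few very rich countries. The sketch below assumes columns named `gdp_per_capita` and `obs_value`, as in the chart; the toy values and the helper name `rank_correlation` are hypothetical. It computes Spearman's coefficient as the Pearson correlation of the ranks.

```python
import pandas as pd

# Hypothetical toy values chosen so that the ranks are unrelated.
toy = pd.DataFrame({
    "gdp_per_capita": [500.0, 1000.0, 2000.0, 4000.0],
    "obs_value": [30.0, 10.0, 40.0, 20.0],
})


def rank_correlation(df, x="gdp_per_capita", y="obs_value"):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    return df[x].rank().corr(df[y].rank())


rho = rank_correlation(toy)
print(f"Rank correlation: {rho:.2f}")
```

Values near 0 indicate a weak monotonic relationship, and values near 1 or -1 a strong one. With only a handful of countries per region, any such coefficient should be read cautiously.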

Code
import pandas as pd
import numpy as np

df_knowledge = pd.read_csv('/content/drive/MyDrive/data/hiv_knowledge.csv', sep=';')
df_testing = pd.read_csv('/content/drive/MyDrive/data/hiv_testing.csv', sep=';')

region_map = {
    # South Asia
    "Afghanistan": "South Asia",
    "Bangladesh": "South Asia",
    "India": "South Asia",
    "Maldives": "South Asia",
    "Nepal": "South Asia",
    "Pakistan": "South Asia",

    # Eastern Europe & Central Asia
    "Armenia": "Eastern Europe & Central Asia",
    "Azerbaijan": "Eastern Europe & Central Asia",
    "Ukraine": "Eastern Europe & Central Asia",
    "Tajikistan": "Eastern Europe & Central Asia",

    # Europe
    "Albania": "Europe",

    # Middle East & North Africa (MENA)
    "Egypt": "Middle East & North Africa (MENA)",
    "Jordan": "Middle East & North Africa (MENA)",
    "Lebanon": "Middle East & North Africa (MENA)",
    "Morocco": "Middle East & North Africa (MENA)",

    # Sub-Saharan Africa
    "Benin": "Sub-Saharan Africa",
    "Burkina Faso": "Sub-Saharan Africa",
    "Burundi": "Sub-Saharan Africa",
    "Cameroon": "Sub-Saharan Africa",
    "Comoros": "Sub-Saharan Africa",
    "Chad": "Sub-Saharan Africa",
    "Congo": "Sub-Saharan Africa",
    "Congo, the Democratic Republic of": "Sub-Saharan Africa",
    "Ethiopia": "Sub-Saharan Africa",
    "Gabon": "Sub-Saharan Africa",
    "Gambia": "Sub-Saharan Africa",
    "Ghana": "Sub-Saharan Africa",
    "Guinea": "Sub-Saharan Africa",
    "Ivory Coast": "Sub-Saharan Africa",
    "Kenya": "Sub-Saharan Africa",
    "Lesotho": "Sub-Saharan Africa",
    "Madagascar": "Sub-Saharan Africa",
    "Malawi": "Sub-Saharan Africa",
    "Mali": "Sub-Saharan Africa",
    "Mauritania": "Sub-Saharan Africa",
    "Mozambique": "Sub-Saharan Africa",
    "Namibia": "Sub-Saharan Africa",
    "Nigeria": "Sub-Saharan Africa",
    "Rwanda": "Sub-Saharan Africa",
    "Sao Tome and Principe": "Sub-Saharan Africa",
    "Senegal": "Sub-Saharan Africa",
    "Sierra Leone": "Sub-Saharan Africa",
    "Somalia": "Sub-Saharan Africa",
    "Swaziland": "Sub-Saharan Africa",
    "Tanzania, United Republic of": "Sub-Saharan Africa",
    "Togo": "Sub-Saharan Africa",
    "Uganda": "Sub-Saharan Africa",
    "Zambia": "Sub-Saharan Africa",
    "Zimbabwe": "Sub-Saharan Africa",

    # Latin America & Caribbean
    "Bolivia, Plurinational State of": "Latin America & Caribbean",
    "Colombia": "Latin America & Caribbean",
    "Dominican Republic": "Latin America & Caribbean",
    "Guyana": "Latin America & Caribbean",
    "Haiti": "Latin America & Caribbean",
    "Honduras": "Latin America & Caribbean",
    "Nicaragua": "Latin America & Caribbean",
    "Peru": "Latin America & Caribbean",

    # East Asia & Pacific
    "Cambodia": "East Asia & Pacific",
    "Indonesia": "East Asia & Pacific",
    "Myanmar": "East Asia & Pacific",
    "Philippines": "East Asia & Pacific",
    "Timor-Leste": "East Asia & Pacific",
    "Papua New Guinea": "East Asia & Pacific",
}

df_knowledge['region'] = df_knowledge['country'].map(region_map)
df_testing['region'] = df_testing['country'].map(region_map)

df_knowledge = df_knowledge[df_knowledge['region'].notna()]
df_testing = df_testing[df_testing['region'].notna()]

df_knowledge_sorted = df_knowledge.sort_values(['country', 'sex', 'year'], ascending=[True, True, False])
last_knowledge = df_knowledge_sorted.groupby(['country', 'sex']).head(1).copy()

df_testing_sorted = df_testing.sort_values(['country', 'sex', 'year'], ascending=[True, True, False])
last_testing = df_testing_sorted.groupby(['country', 'sex']).head(1).copy()

last_knowledge_avg = last_knowledge.groupby(['country', 'region'])['obs_value'].mean().reset_index()
last_testing_avg = last_testing.groupby(['country', 'region'])['testing_knowledge'].mean().reset_index()

df_last = pd.merge(last_knowledge_avg, last_testing_avg, on=['country', 'region'], how='inner')

df_regional_last = (
    df_last.groupby('region')[['testing_knowledge', 'obs_value']]
    .mean()
    .reset_index()
)
Code
import matplotlib.pyplot as plt
import numpy as np

region_colors = {
    "South Asia": "#59AD65",
    "Eastern Europe & Central Asia": "#4D9DAB",
    "Middle East & North Africa (MENA)": "#FDCC9E",
    "Sub-Saharan Africa": "#FF9D4B",
    "Latin America & Caribbean": "#6EB0A3",
    "East Asia & Pacific": "#98DF8C"
}

df = df_regional_last[df_regional_last['region'].isin(region_colors.keys())].copy()

plt.figure(figsize=(9, 6))

for region in df['region']:
    plt.scatter(
        df.loc[df['region'] == region, 'testing_knowledge'],
        df.loc[df['region'] == region, 'obs_value'],
        color=region_colors[region],
        s=220,
        label=region,
        edgecolors='k',
        alpha=0.9
    )

x = df['testing_knowledge']
y = df['obs_value']
slope, intercept = np.polyfit(x, y, 1)
trend_x = np.linspace(0, 100, 200)
trend_y = slope * trend_x + intercept
plt.plot(trend_x, trend_y, color='gray', linewidth=2.2, alpha=0.7, zorder=0)

plt.title(
    "HIV Knowledge & Testing Awareness: Are Youth Well-Informed? (latest available year)",
    fontsize=14, weight='bold', pad=22
)
plt.xlabel("% of Youth (15–24) Who Know Where to Get Tested for HIV", fontsize=12)
plt.ylabel("% of Youth with Comprehensive, Correct HIV Knowledge", fontsize=12)

plt.xlim(0, 100)
plt.ylim(0, 60)
plt.xticks(np.arange(0, 101, 10), [f"{int(x)}%" for x in np.arange(0, 101, 10)])
plt.yticks(np.arange(0, 61, 10), [f"{int(y)}%" for y in np.arange(0, 61, 10)])
plt.grid(True, which='major', linestyle='--', alpha=0.5)

handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(
    by_label.values(), by_label.keys(),
    title="Region", loc='upper left', fontsize=12, title_fontsize=12,
    frameon=True, bbox_to_anchor=(0.01, 0.98), borderpad=0.9
)
plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.show()

🧪 Infrastructure Versus Information 🧠

Many countries have invested in making HIV testing easily accessible. However, knowing where to get tested for HIV does not necessarily mean understanding the virus accurately.

The visualisation highlights a gap between health infrastructure and health literacy. Time to pair testing with teaching. Let’s get studying! 📚
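
The gap between testing awareness and comprehensive knowledge can be expressed as a simple difference per region. This sketch assumes a table shaped like `df_regional_last` (columns `region`, `testing_knowledge`, `obs_value`); the toy values and the helper name `awareness_gap` are hypothetical.

```python
import pandas as pd

# Hypothetical toy regional values: placeholders, not real results.
toy = pd.DataFrame({
    "region": ["R1", "R2", "R3"],
    "testing_knowledge": [70.0, 55.0, 40.0],
    "obs_value": [30.0, 35.0, 25.0],
})


def awareness_gap(df):
    """Percentage-point gap between knowing where to test and comprehensive knowledge."""
    out = df.copy()
    out["gap_pp"] = out["testing_knowledge"] - out["obs_value"]
    return out.sort_values("gap_pp", ascending=False).reset_index(drop=True)


gap_table = awareness_gap(toy)
print(gap_table)
```

A large positive gap points to regions where young people know where to get tested but lack accurate knowledge of the virus itself.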

🧩 What does it mean?

Although information is increasingly available, only some groups of young people access it, and with varying depth of understanding. Improving knowledge among youth is not just a matter of checking the right boxes: it is a prevention strategy.

By closing gender gaps, strengthening education strategies and promoting safe practices, we will be able to protect youth worldwide from contracting and propagating this virus.

🚀 Going Forward

While real progress has been made in the fight against HIV, discrimination, stigma and false information continue to endanger millions, especially marginalized youth and LGBTQIA+ communities.

Thanks to major scientific advances such as antiretroviral therapy (ART), millions of people living with HIV can live long and healthy lives, because treatment can make the viral load undetectable, which prevents transmission. Additionally, newer prevention tools such as self-tests and PrEP support a positive shift towards ending the epidemic. However, access to these tools remains limited in certain regions of the world. Let’s ensure treatment and education reach everyone, no exceptions.

For more interactive information, find my Tableau report here.

🤝 Want to Help? Find details about UNICEF’s HIV & AIDS program here.

Code
from IPython.display import display, HTML

html_code = """
<a href="https://www.unicef.ie/donate/?utm_source=unicef.org.referral&utm_medium=donatelink&utm_content=donate&utm_campaign=unicef.org&_gl=1*1lofuyv*_gcl_au*MTQ0MjIxOTg0My4xNzQyOTE1MDcy*_ga*Mzc3NjU5NDY4LjE3NDI5MTUwNzI.*_ga_P0DMSZ8KY6*MTc0MzAwODg1OS4yLjEuMTc0MzAwOTc3OS42MC4wLjA.*_ga_ZEPV2PX419*MTc0MzAwODg1OS4yLjEuMTc0MzAwOTc3OS42MC4wLjA.*_ga_BCSVVE74RB*MTc0MzAwODg1OS4yLjEuMTc0MzAwOTc3OS42MC4wLjA.#1" target="_blank">
    <img src="/content/drive/MyDrive/images/donate.png" width="200"/>
</a>
"""
display(HTML(html_code))